Introduction


• Xperf is awesome (see last year's Gamefest talk) 

• Xperf has a "challenging" learning curve 

• Talk goals: 

• Pass on the Xperf lessons learned at Valve 

• Pass on the techniques Valve uses (including sample code) 

• Encourage a common perf-interchange format 

• Force me to learn ETW/Xperf more thoroughly 
What is Xperf?


• Free, whole-system ETW profiling tool

• ETW stands for Event Tracing for Windows 

• Disk, CPU, GPU, processes, threads, etc. 

• Includes sampling profiler 

• Used extensively by Microsoft 

• Profiling without Xperf is also known as "guessing"

• "I think our level loads are bound by I/O time" (there was none) 

• "I think our level loads are bound by CPU time" (wrong again) 
Stuff Xperf Found


• 400 ms startup hang on Portal 2 and Dota 2 

• 10 s of static lighting initialization on map load 

• On a game that didn't use static lighting 

• 3 s of wasted time during map load 

• 100,000 unintentional memory allocations 

• Conditional breakpoint accidentally left enabled 

• Excessive assert cost in debug builds 

• Many, many more
How Valve Uses Xperf


• You can record just system data 

• Sampling profiler, Disk I/O, page faults, context switches, DirectX 
information, memory allocations, etc. 

• But system data is much more valuable with context:

• Frame start/frame rate 

• Key events (begin/end task, etc.) 

• Network traffic 

• User input 

• Etc. 

• User providers let you provide this context 
Context: Graph View


Some things Valve's user providers tell us include: 
Common Timeline


Idle CPU


The real power comes from having user events in the same view as system events
Grouping is powerful
Time is recorded for all events
Input events are awesome
How Valve Uses Xperf


• Tracing is always on

• Kernel data goes to a 600 MB circular buffer

• ~2 minutes of data on a busy 12-proc machine

• User data goes to a 100 MB circular buffer

• ~2-100 minutes of data, depending on which providers are active

• In-memory circular buffers

• Buffers can be saved to disk after a performance problem is noticed

• Retroactive profiling
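Retroactive profiling works because the buffers overwrite their oldest data instead of growing. The C++ below is a hypothetical illustration of that idea, not how ETW manages its buffers internally: a fixed-capacity ring that keeps only the most recent events, plus a Snapshot() that plays the role of saving the buffer after a problem is noticed. TraceEvent and CircularTraceBuffer are names invented for this sketch. As rough arithmetic, 600 MB holding ~2 minutes of kernel data works out to about 5 MB/s on that busy machine.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical event record; real ETW buffers hold binary event data.
struct TraceEvent {
    long long timestamp;
    std::string name;
};

// Fixed-capacity ring buffer: once full, each new event overwrites the oldest.
class CircularTraceBuffer {
public:
    explicit CircularTraceBuffer(std::size_t capacity)
        : events_(capacity), next_(0), count_(0) {
        assert(capacity > 0);
    }

    void Record(const TraceEvent& e) {
        events_[next_] = e;                     // overwrites the oldest slot when full
        next_ = (next_ + 1) % events_.size();
        if (count_ < events_.size()) ++count_;
    }

    // The "save to disk" step: copy out the retained events, oldest first.
    std::vector<TraceEvent> Snapshot() const {
        std::vector<TraceEvent> out;
        out.reserve(count_);
        std::size_t start = (next_ + events_.size() - count_) % events_.size();
        for (std::size_t i = 0; i < count_; ++i)
            out.push_back(events_[(start + i) % events_.size()]);
        return out;
    }

private:
    std::vector<TraceEvent> events_;
    std::size_t next_;   // slot the next event will be written to
    std::size_t count_;  // number of valid events currently retained
};
```

Recording five events into a three-slot buffer leaves only the last three, which is the trade-off the 600 MB and 100 MB sizes above are tuning: more memory buys a longer look back in time.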
Demo


• Load various Xperf traces and show actual issues found at Valve
Xperf Compared to Bracketed Event Profilers


• PIXBeginNamedEvent/PIXEndNamedEvent-style profilers coexist well with Xperf

• Bracketed Event Profilers: 

• Have a lower data rate, allowing faster data manipulation

• Make slow frames easier to see 

• Xperf: 

• Shows OS details (other processes, disk, locks, etc.) 

• Shows what happens between the bracketed events 

• Works when there are no events 
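To make the comparison concrete, here is a minimal bracketed-event profiler. It is a hypothetical sketch in the spirit of PIX named events, not Valve's code or the PIX API: each ScopedEvent records a name plus begin and end timestamps, and nothing at all is recorded for time spent outside a scope.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// One record per bracketed region: a name plus begin/end timestamps.
struct BracketedEvent {
    std::string name;
    std::chrono::steady_clock::time_point begin;
    std::chrono::steady_clock::time_point end;
};

// Hypothetical global log; a real profiler would use per-thread storage.
std::vector<BracketedEvent> g_events;

// RAII helper: the constructor marks the beginning of the region and
// the destructor marks the end, so every Begin has a matching End.
class ScopedEvent {
public:
    explicit ScopedEvent(std::string name)
        : name_(std::move(name)), begin_(std::chrono::steady_clock::now()) {}
    ~ScopedEvent() {
        g_events.push_back({name_, begin_, std::chrono::steady_clock::now()});
    }
    ScopedEvent(const ScopedEvent&) = delete;
    ScopedEvent& operator=(const ScopedEvent&) = delete;

private:
    std::string name_;
    std::chrono::steady_clock::time_point begin_;
};
```

Nested scopes produce nested intervals (the inner event is logged first because its destructor runs first). Anything that happens between scopes, or in another process, never shows up in g_events; that gap is what Xperf's system-wide data fills.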
ETW User Provider Definition (Manifest File)


<provider
    guid="{2B25961D-BA6E-4D79-BEC7-3605366E2E09}"
    name="Multi-FrameRate"
    symbol="MULTI_FRAMERATE"
    messageFileName="%temp%\MultiProvider.exe"
    resourceFileName="%temp%\MultiProvider.exe" >

  <templates>
    <template tid="T_FrameMark">
      <data inType="win:Int32" name="Frame number"/>
      <data inType="win:Float" name="Duration (ms)"/>
    </template>
  </templates>

  <opcodes>
    <opcode name="FrameMark" symbol="_FrameMarkOpcode" value="10"/>
  </opcodes>

  <tasks>
    <task name="Frame" symbol="Frame_Task" value="1"
          eventGUID="{43DADA85-49B6-4438-83D6-931477635DE3}"/>
  </tasks>

  <events>
    <event symbol="FrameMark" template="T_FrameMark" value="200" task="Frame" opcode="FrameMark"/>
  </events>

</provider>


Provider definition
Name, location, and GUID

Event payloads. Available types include:

• Signed/unsigned 8-bit, 16-bit, 32-bit, and 64-bit integers

• ANSI and Unicode strings

• Float and Double

• Boolean, Binary, GUID, Pointer, FILETIME, and others

Static event data

• Used to aid in interpreting, sorting, and grouping

Event definitions
Ties together payload and static data

Your code emits events 
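As an illustration only, here is one way a per-frame call could fill the T_FrameMark payload defined above (a win:Int32 frame number and a win:Float duration in milliseconds). EventWriteFrameMarkStub is a hypothetical stand-in for the macro that mc.exe generates from the manifest; the real macro's name and parameter list come from the generated header, so treat the call site as an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the generated FrameMark macro. It just
// remembers the last payload so the logic can be checked without ETW.
static std::int32_t g_lastFrame = -1;
static float g_lastDurationMs = -1.0f;
void EventWriteFrameMarkStub(std::int32_t frameNumber, float durationMs) {
    g_lastFrame = frameNumber;
    g_lastDurationMs = durationMs;
}

// Call once per rendered frame: emits the frame number and the time
// since the previous call, matching the T_FrameMark template fields.
void EmitFrameMark() {
    using Clock = std::chrono::steady_clock;
    static std::int32_t s_frameNumber = 0;
    static Clock::time_point s_lastFrame = Clock::now();

    Clock::time_point now = Clock::now();
    float durationMs =
        std::chrono::duration<float, std::milli>(now - s_lastFrame).count();
    s_lastFrame = now;

    EventWriteFrameMarkStub(s_frameNumber++, durationMs);
}
```

Putting the duration in the payload means slow frames can be sorted and grouped in Xperf's summary tables without computing deltas between events by hand.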
Writing an Instrumentation Manifest


• http://msdn.microsoft.com/en-us/library/dd996930(VS.85).aspx

• You can use Visual Studio

• Get \Include\Eventman.xsd (from the Platform SDK) and use the XML->Schemas menu to include it

• You can use Manifest_Generator (ECManGen.exe) to edit instrumentation manifests

• From the Platform SDK

• You can go old school: Notepad!


[Screenshot: Manifest Generator editing etwprovider.man, showing the providers Multi-Input, Multi-FrameRate, Multi-Worker, and Multi-Main with their GUIDs; Multi-FrameRate contains the RenderFrameMark event, the T_FrameMark template, the Frame task, and the RenderFrameMark opcode.]


Compile Manifest 


• mc.exe -um %(Filename)%(Extension) -z %(Filename)Generated

• Generates: 

• %(Filename)Generated.h 

• %(Filename)Generated.rc 

• %(Filename)Generated_MSG00001.bin (compiled into resource file) 

• %(Filename)GeneratedTEMP.bin (compiled into resource file) 

• Don't check in the generated files 

• Don't forget to build the resource file into your program
Valve's ETW API


#ifdef WIN32
PLATFORM_INTERFACE int64 ETWMark( const char *pMessage );
PLATFORM_INTERFACE int64 ETWMarkPrintf( const char *pMessage, ... );
PLATFORM_INTERFACE int64 ETWBegin( const char *pMessage );
PLATFORM_INTERFACE int64 ETWEnd( const char *pMessage, int64 nStartTime );
PLATFORM_INTERFACE void ETWRenderFrameMark();
PLATFORM_INTERFACE void ETWSimFrameMark();
PLATFORM_INTERFACE void ETWMouseDown( int nWhichButton, int nX, int nY );
PLATFORM_INTERFACE void ETWMouseUp( int nWhichButton, int nX, int nY );
PLATFORM_INTERFACE void ETWKeyDown( int nScanCode, int nVirtualCode, const char *pChar );
PLATFORM_INTERFACE void ETWSendPacket( const char *pTo, int nWireSize, int nOutSequenceNR, int nOutSequenceNrAck );
PLATFORM_INTERFACE void ETWThrottled();
PLATFORM_INTERFACE void ETWReadPacket( const char *pFrom, int nWireSize, int nInSequenceNR, int nOutSequenceNRAck );
#else
// Inline NOP functions for cross-platform compatibility
#endif
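The slide elides the non-Windows branch. One possible shape for those NOP stubs is sketched below. It assumes int64 is a 64-bit signed integer typedef (the typedef here is an assumption, not Valve's definition). Each function ignores its arguments, and the ones that return timestamps return 0, so call sites compile unchanged on other platforms with no #ifdefs.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: int64 is a 64-bit signed integer in Valve's codebase.
typedef std::int64_t int64;

// Non-Windows branch: every call does nothing. Timing functions
// return 0 as a harmless placeholder start/elapsed time.
inline int64 ETWMark( const char * ) { return 0; }
inline int64 ETWMarkPrintf( const char *, ... ) { return 0; }
inline int64 ETWBegin( const char * ) { return 0; }
inline int64 ETWEnd( const char *, int64 ) { return 0; }
inline void ETWRenderFrameMark() {}
inline void ETWSimFrameMark() {}
inline void ETWMouseDown( int, int, int ) {}
inline void ETWMouseUp( int, int, int ) {}
inline void ETWKeyDown( int, int, const char * ) {}
inline void ETWSendPacket( const char *, int, int, int ) {}
inline void ETWThrottled() {}
inline void ETWReadPacket( const char *, int, int, int ) {}
```

Making the stubs inline and empty lets the compiler drop the calls entirely, so instrumented code costs nothing on platforms without ETW.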



Gamefest 2011


Valve's ETW API Implementation 


Startup/shutdown: 

#include <ETWProviderGenerated.h> 

EventRegisterValve_Network(); // Call this at process startup for each provider 
EventUnregisterValve_Network(); // Call this at process shutdown for each provider 
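Because every provider needs a matching register/unregister pair, a small RAII guard can tie the two together. This is a sketch, not part of Valve's sample: the two functions at the top are hypothetical stand-ins that only count calls (delete them when including the real ETWProviderGenerated.h), and the generated functions are assumed to return 0 on success.

```cpp
#include <cassert>

// Hypothetical stand-ins for the functions in ETWProviderGenerated.h,
// so this sketch compiles on its own. They only track registration.
static int g_registered = 0;
unsigned long EventRegisterValve_Network() { ++g_registered; return 0; }
unsigned long EventUnregisterValve_Network() { --g_registered; return 0; }

// Registers the provider for the lifetime of the object, so the
// unregister call cannot be forgotten on any path that runs destructors.
class ProviderRegistration {
public:
    ProviderRegistration() { EventRegisterValve_Network(); }
    ~ProviderRegistration() { EventUnregisterValve_Network(); }
    ProviderRegistration(const ProviderRegistration&) = delete;
    ProviderRegistration& operator=(const ProviderRegistration&) = delete;
};
```

In a game this would typically be one object per provider, created near the top of main() or as a file-scope static.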


Implementation

void ETWSendPacket( const char *pTo, int nWireSize, int nOutSequenceNR, int nOutSequenceNrAck )
{
    static int s_nCumulativeWireSize;
    s_nCumulativeWireSize += nWireSize;

    // EventWriteSendPacket is a macro in the generated header file
    EventWriteSendPacket( pTo, nWireSize, nOutSequenceNR, nOutSequenceNrAck, s_nCumulativeWireSize );
}

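ETWSendPacket keeps its running total in a plain static int, which is fine as long as packets are only sent from one thread. If that can't be guaranteed (an assumption about the engine, not something the slide states), an atomic total avoids lost updates. EventWriteSendPacketStub and SendPacketCumulative are hypothetical names used only for this sketch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the generated EventWriteSendPacket macro.
void EventWriteSendPacketStub(const char*, int, int, int, int) {}

// Same logic as ETWSendPacket, but the running total is atomic,
// so concurrent senders cannot lose each other's updates.
std::atomic<int> g_cumulativeWireSize{0};

int SendPacketCumulative(const char* pTo, int nWireSize, int nOutSeq, int nOutSeqAck) {
    // fetch_add returns the previous value; add nWireSize to get this packet's total.
    int total = g_cumulativeWireSize.fetch_add(nWireSize) + nWireSize;
    EventWriteSendPacketStub(pTo, nWireSize, nOutSeq, nOutSeqAck, total);
    return total;
}
```

Using the value returned by fetch_add (rather than re-reading the atomic) guarantees each event carries the total as of its own packet, even if another thread adds to it in between.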

XP compatibility thunks: 

#define EVNTAPI __stdcall
#include "ETWProviderGenerated.h" 


ULONG EVNTAPI EventWrite( REGHANDLE RegHandle, PCEVENT_DESCRIPTOR EventDescriptor, ULONG UserDataCount, PEVENT_DATA_DESCRIPTOR UserData ) 

{ 

if (g_ETWRegister.m_pEventWrite) 

return g_ETWRegister.m_pEventWrite( RegHandle, EventDescriptor, UserDataCount, UserData ); 
return 0; 

} 
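The thunk works because the program, not its import table, decides whether EventWrite exists: the real function pointer is looked up at runtime, and when it is missing (as on Windows XP) the call becomes a no-op that reports success. The portable sketch below shows that pattern with a simplified signature; PFN_EventWrite, ETWFunctions, g_ETWFunctions, and EventWriteThunk are hypothetical names. On Windows Vista and later the pointer would typically be filled in with GetProcAddress on advapi32.dll, which is omitted here so the sketch compiles anywhere.

```cpp
#include <cassert>

// Simplified signature for illustration; the real EventWrite takes
// REGHANDLE, PCEVENT_DESCRIPTOR, ULONG, and PEVENT_DATA_DESCRIPTOR.
typedef unsigned long (*PFN_EventWrite)(unsigned long long regHandle, int eventId);

// Plays the role of g_ETWRegister.m_pEventWrite: null when the OS lacks the API.
struct ETWFunctions {
    PFN_EventWrite m_pEventWrite = nullptr;
};
ETWFunctions g_ETWFunctions;

// Thunk: forward if the real function was found, otherwise silently succeed.
unsigned long EventWriteThunk(unsigned long long regHandle, int eventId) {
    if (g_ETWFunctions.m_pEventWrite)
        return g_ETWFunctions.m_pEventWrite(regHandle, eventId);
    return 0;
}
```

Returning 0 when the function is missing means instrumented code never has to check the OS version; events are simply dropped on systems that cannot record them.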
Demo


• Run Visual Studio, load the sample, and build it 

• Register the providers 

• xcopy /y yourgame.exe %temp% 

• wevtutil um etwmanifest.man

• wevtutil im etwmanifest.man

• Run the sample 

• Record a trace, analyze it 

• Etwrecord.bat myfirsttrace.etl 
Necessary Sample Customizations


• Replace all GUIDs in ETWProvider.man to avoid conflicts 

• Rename provider 'name' and 'symbol' in ETWProvider.man

• Also update etwcommonsettings.bat and etwprof.cpp to match 
new names/symbols 

• Adjust 'messageFileName' and 'resourceFileName' in etwprovider.man, and DLLFileMain and DLLFileAlternate in etwregister.bat
Technical Challenges


• 32-bit stack walking is buggy on 64-bit Windows Vista 

• Sampling profiler becomes useless 

• Luckily we use 64-bit Windows 7 

• Xperf/ETW work on Windows XP (with many limitations), but won't 
install on Windows XP 


• Find a 32-bit Windows Vista or Windows 7 machine, install there, then copy the install image

• Running applications off of a non-system drive is busted

• Use mklink/junction to hack around this

• mklink /j c:\dota d:\dota

• No easy way to record traces on customer machines

• Working on an installer and Xperf wrapper


Junction junction, what's 
your function... 
Process Simplifications

• Installation from Windows SDK is tedious 

• Make local distribution directory for xcopy install 

• Syntax for recording traces is byzantine

• Make bulletproof batch files, put them in the distribution directory

• xcopy install is too much work (???) 

• Allow running batch file from network drive 

• Typing batch file parameters is hard 

• Make all parameters optional 

• Users might not register or have user providers 

• Make batch files fall back gracefully if providers aren't registered
More Process Simplifications


• 64-bit stack walking requires setting reg key, rebooting 

• Set the reg key in the batch files

• Get IT to deploy the reg key using group policy 

• Frame Pointer Omission (/Oy) breaks 32-bit stack walks 

• Change all project files to /Oy-, default with VS 2010 

• Developers won't run circular buffer recording 

• Automatically start/stop it when recording traces 

• Creating junctions is annoying 

• Modify batch files to do it automatically 

• Not everyone has _NT_SYMBOL_PATH set 

• Have batch files set it if not already set 
Ultimate Simplification


• Redistributable Xperf wrapper is possible

• About three days' work

• Installs Xperf

• Buttons for starting and stopping circular buffer tracing, and for recording traces

• Also registers providers

• Adds a global hot key for recording traces


[Screenshot: the "Record Valve Perf Trace" dialog, with Start Tracing, Start Full Tracing, Record Trace, and Stop Tracing buttons, a Fast Sampling option, and a list of recorded traces. Its log shows tracing starting on Windows 7+, providers being initialized from DotA 2's tier0.dll, kernel and user traces being merged into a ValveTrace_2011_08_17_....etl file under Documents\ValvePerfTraces, and warnings that the trace may contain personally identifiable information as well as user input, including chat and console commands from Valve games.]


Symbols


• http://msdl.microsoft.com/download/symbols

• Your symbol server

• You do have a symbol server, don't you?

• symstore add /f *.pdb /s \\OurSymbolServer\symbols ...

• Add build directories to the symbol path

• Symbol path (xperf, windbg, VS, etc.):

• ...mygame\bin;SRV*c:\symbols*\\OurSymbolServer\symbols*http://msdl.microsoft.com/download/symbols


If this keeps showing up, then create a symsrv.yes file in the Xperf install directory

[Screenshot: the "Microsoft: Internet Symbol Store" dialog asking you to accept the Terms of Use for Microsoft debugging symbols and executables (Yes/No).]


81% of CPU time in this region is in code I don't have symbols for

[Screenshot: Xperf function table for shaderapidx9.dll, listing functions such as CTransitionTable::Find..., UmaDecode, CTransitionTable::Crea..., write_string, _set_flsgetvalue, BlitSurfaceBits, and _output_l.]
Xperf as Perf-interchange Format


• No company is an island 

• Valve uses projects and source from other companies 

• Valve uses DLLs from other companies 

• We need them built with /Oy- and symbols so we can profile our game with 
this foreign code inside it 

• When we hit problems in your code, we want to send you an ETW 
file 

• If you hit problems in our code, we want you to send us an ETW file 

• Common toolset allows sharing techniques and skills 

• Common toolset allows reporting perf-bugs in others' code 
Xperf for Other People's Software


• Recording graphics performance problems, sending traces to IHVs

• Found and reported opportunities for improved performance in PowerPoint, Visual Studio, and Windows Live Photo Gallery

• Used Xperf to profile a third-party profiler, and to profile Xperf itself

• Server performance problem due to a Windows bug

• Profiling Valve's games before starting at Valve 

• Poor network perf caused by network driver DPC time 
Platform SDK


• Make sure you are getting the latest version 

• 7.1 as of this writing, available at http://msdn.microsoft.com/en-us/windows/bb980924.aspx

• Contains Xperf, and Xperf installers

• Also contains Manifest_Generator, GUID Generator, eventman.xsd 

• Plus other goodies like Application Verifier, debuggers, etc. 

• Note that with Visual Studio 2010 SP1 installed, the Platform SDK will fail to fully install

• To avoid this failure, don't install the SDK compilers 

• If they're needed, you can install them afterward from 
http://go.microsoft.com/fwlink/?LinkID=212355 
Corrections


• The undocumented "-capturestate" option to xperf does 
not work on Vista 

• Sorry - you'll have to fix the batch files 

• But tracing works better on Windows 7 anyway 

• And it's not documented 
Other Xperf Stuff I'd Cover if I Had Time


• Heap profiling 

• GPUView 

• Finding UI hangs 

• Advanced threading analysis 

• Summary tables, summary tables, summary tables

• Python script for packaging up .symcache files used by a 
trace 





Resources 


• http://randomascii.wordpress.com/category/xperf/

• Resource links, slides, and sample project are here. Future updates will go here 

• See sample code 

• Uses multiple ETW providers to record game-relevant data 

• Includes batch files for recording traces 

• Readme.txt explains what to do 

• Last year's xperf talk 

• http://www.microsoft.com/downloads/details.aspx?familyid=14f1Qc84-8f31-412d-bcdf-4f1097bd8b5f&displaylang=en

• Platform SDK 

• http://msdn.microsoft.com/en-us/windows/bb980924.aspx 

• Writing an instrumentation manifest 

• http://msdn.microsoft.com/en-us/library/dd996930(VS.85).aspx 

• Documentation of all event payload template types 

• http://msdn.microsoft.com/en-us/library/aa382774(v=VS.85).aspx 

• http://www.bing.com/search?q=xperfview

• http://www.bing.com/search?q=Event+Tracing+for+Windows